The present invention relates generally to data storage
systems, and specifically to data mirroring for failure
protection in storage systems.

BACKGROUND OF THE INVENTION 

Data backup is a standard part of all large-scale
computer data storage systems (and most small systems, as
well). Data written to a primary storage medium, such as a
volume on a local storage subsystem, are copied, or
"mirrored," to a backup medium, typically another volume on
a remote storage subsystem. The backup volume can then be
used for recovery in case a disaster causes the data on the
primary medium to be lost. Methods of remote data mirroring
are surveyed by Ji et al., in an article entitled "Seneca:
Remote Mirroring Done Write," Proceedings of USENIX
Technical Conference (San Antonio, Texas, 2003), pages
253-268, which is incorporated herein by reference. The
authors note that design choices for remote mirroring must
attempt to satisfy the competing goals of keeping copies as
closely synchronized as possible, while delaying foreground
writes by host processors to the local storage subsystem as
little as possible.

Large-scale storage systems, such as the IBM Enterprise
Storage Server (ESS) (IBM Corporation, Armonk, New York),
typically offer a number of different copy service functions
that can be used for remote mirroring. Among these
functions is peer-to-peer remote copy (PPRC), in which a
mirror copy of a source volume on a primary storage
subsystem is created on a secondary storage subsystem. When
an application on a host processor writes to a PPRC volume,
the corresponding data updates are entered into cache memory
and non-volatile storage at the primary subsystem. The
control unit (CU) of the primary subsystem then sends the
updates over a communication link to the secondary
subsystem. When the CU of the secondary subsystem has placed
the data in its own cache and non-volatile storage, it
acknowledges receipt of the data. The primary subsystem then
signals the application that the write operation is
complete.

PPRC provides host applications with essentially
complete security against single-point failures, since all
data are written synchronously to non-volatile media in both
the primary and secondary storage subsystems. On the other
hand, the need to save all data in non-volatile storage on
both subsystems before the host write operation is
considered complete can introduce substantial latency into
host write operations. In some large-scale storage systems,
such as the above-mentioned IBM ESS, this latency is reduced
by initially writing data both to cache and to high-speed,
non-volatile media, such as non-volatile random access
memory (RAM), in both the primary and secondary subsystems.
The data are subsequently copied to disk asynchronously (an
operation that is also referred to as "hardening" the data)
and removed from the non-volatile memory. The large amount
of non-volatile memory that must be used for this purpose is
very costly.

SUMMARY OF THE INVENTION

The present invention provides methods for data
mirroring that can be used to create storage systems that
are immune to single-point failures and have low-latency
write response, without requiring special non-volatile
memory or other costly components. In embodiments of the
present invention, when a host writes data to a primary
storage subsystem, the primary storage subsystem records the
data in volatile cache memory, and transmits a copy of the
data to a secondary storage subsystem. The secondary
storage subsystem likewise writes the data to its cache, and
sends an immediate acknowledgment to the primary storage
subsystem. The primary storage subsystem then signals the
host to acknowledge that the write operation has been
completed, without waiting for the data to be written to the
disk (or other non-volatile media) on either the primary or
secondary storage subsystem.

Both primary and secondary storage subsystems keep a
record of the address ranges of data that the other
subsystem has received in its cache, but may not yet have
copied to non-volatile storage. In the event of a failure
in one of the subsystems, this record indicates which data
will have to be copied back to the failed subsystem during
recovery (in addition to any new data that may have been
written to the operating subsystem during the period of the
failure). From time to time, during normal operation, each
subsystem informs the other of the address ranges that it
has hardened, whereupon the other subsystem removes these
ranges from its record. Thus, upon recovery from a failure,
the amount of data that must be copied back to the failed
subsystem is limited to the address ranges listed in the
record maintained by the non-failed system, so that the time
needed for full recovery is not too long.

Since data are recorded synchronously and records are
maintained symmetrically on both the primary and secondary
storage subsystems, the secondary storage subsystem can take
the place of the primary storage subsystem immediately in
case of a failure. Furthermore, read operations can be
directed to either of the storage subsystems at any time.

There is therefore provided, in accordance with an
embodiment of the present invention, a method for storing
data in a data storage system that includes primary and
secondary storage subsystems, including respective first and
second volatile cache memories and respective first and
second non-volatile storage media, the method including:
receiving the data at the primary storage subsystem
from a host processor;
writing the data to the first volatile cache memory in
the primary storage subsystem;
copying the data from the primary storage subsystem to
the secondary storage subsystem;
writing the copied data to the second volatile cache
memory in the secondary storage subsystem;
returning an acknowledgment from the secondary storage
subsystem to the primary storage subsystem responsively to
writing the copied data to the second volatile cache memory
and prior to saving the data in the second non-volatile
storage media;
signaling the host processor that the data have been
stored in the data storage system responsively to the
acknowledgment from the secondary storage subsystem; and
transferring the data in the primary and secondary
storage subsystems from the first and second volatile cache
memories to the first and second non-volatile storage media,
respectively.

In some embodiments, copying the data includes
transmitting the data between mutually-remote sites over a
communication link between the sites. Alternatively, the
second volatile cache memory and the second non-volatile
storage media are located in mutually-remote sites, and
transferring the data includes transmitting the data from
the second volatile cache memory to the second non-volatile
storage media over a communication link between the sites.

Typically, copying the data includes creating a mirror
on the secondary storage subsystem of the data received by
the primary storage subsystem. The method may include, upon
occurrence of a failure in the primary storage subsystem,
configuring the secondary storage subsystem to serve as the
primary storage subsystem so as to receive further data from
the host processor to be stored by the data storage system.
In one embodiment, the second volatile cache memory is
located at a site operated by a service provider other than
an owner of the primary storage subsystem, and transferring
the data from the second volatile cache memory to the second
non-volatile storage media includes processing the data on a
fee-per-service basis.

In disclosed embodiments, transferring the data
includes sending a message from the secondary storage
subsystem to the primary storage subsystem indicating
addresses of the data that have been transferred to the
second non-volatile storage media, and the method further
includes creating a record on the primary storage subsystem
of the addresses of the data copied to the secondary storage
subsystem, and updating the record in response to the
message. Typically, the method includes, upon recovery of
the system from a failure of the secondary storage
subsystem, conveying, responsively to the record, a portion
of the data from the primary storage subsystem to be stored
on the secondary storage subsystem. Updating the record may
include removing from the record the addresses of the data
that have been transferred to the second non-volatile
storage media.



In one embodiment, creating the record includes marking
respective bits in a bitmap corresponding to addresses of
the data copied to the secondary storage subsystem, and
updating the record includes clearing the respective bits.

Additionally or alternatively, transferring the data
includes transferring the data in a range of the addresses
from the second volatile cache memory to the second
non-volatile storage media, and sending the message includes
informing the primary storage subsystem that the data in the
range have been transferred, so that the primary storage
subsystem updates the record with respect to the range. In
one embodiment, transferring the data in the range includes
destaging the range of the addresses. In another
embodiment, creating the record includes maintaining a
mirror of the record on the secondary storage subsystem, and
transferring the data in the range includes selecting the
data to be transferred from the second volatile cache memory
to the second non-volatile storage media responsively to the
mirror.

In other embodiments, transferring the data includes
sending a message from the primary storage subsystem to the
secondary storage subsystem indicating addresses of the data
that have been transferred to the first non-volatile storage
media, and the method further includes creating a record on
the secondary storage subsystem of the addresses of the data
copied to the secondary storage subsystem, and updating the
record in response to the message.

There is also provided, in accordance with an
embodiment of the present invention, a data storage system,
including:
a primary storage subsystem, which includes a first
volatile cache memory and first non-volatile storage media;
and
a secondary storage subsystem, which includes a second
volatile cache memory and second non-volatile storage media,
wherein the primary storage subsystem is arranged to
receive data from a host processor, to write the data to the
first volatile cache memory, to copy the data to the
secondary storage subsystem, and to transfer the data from
the first volatile cache memory to the first non-volatile
storage media, and
wherein the secondary storage subsystem is arranged to
receive and write the copied data to the second volatile
cache memory, to transfer the data from the second volatile
cache memory to the second non-volatile storage media, and
to return an acknowledgment to the primary storage subsystem
responsively to writing the copied data to the second
volatile cache memory and prior to transferring the data to
the second non-volatile storage media,
wherein the primary storage subsystem is arranged to
signal the host processor that the data have been stored in
the data storage system responsively to the acknowledgment
from the secondary storage subsystem.

There is additionally provided, in accordance with an
embodiment of the present invention, a computer software
product for use in a data storage system including primary
and secondary storage subsystems, which include respective
first and second control units, respective first and second
volatile cache memories, and respective first and second
non-volatile storage media, the product including a
computer-readable medium in which program instructions are
stored, which instructions, when read by the first and
second control units, cause the first control unit to
receive data from a host processor, to write the data to the
first volatile cache memory, to copy the data to the
secondary storage subsystem, and to transfer the data from
the first volatile cache memory to the first non-volatile
storage media, and cause the second control unit to receive
and write the copied data to the second volatile cache
memory, to transfer the data from the second volatile cache
memory to the second non-volatile storage media, and, prior
to transferring the data to the second non-volatile storage
media, to return an acknowledgment to the primary storage
subsystem responsively to writing the copied data to the
second volatile cache memory, wherein the instructions
further cause the first control unit to signal the host
processor that the data have been stored in the data storage
system responsively to the acknowledgment from the secondary
storage subsystem.

The present invention will be more fully understood
from the following detailed description of the embodiments
thereof, taken together with the drawings in which:

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram that schematically
illustrates a data storage system, in accordance with an
embodiment of the present invention;

Fig. 2 is a block diagram that schematically
illustrates a storage subsystem, in accordance with an
alternative embodiment of the present invention;

Fig. 3 is a schematic representation of bitmaps used in
tracking data storage, in accordance with an embodiment of
the present invention;

Fig. 4 is a flow chart that schematically illustrates a
method for writing data to a data storage system, in
accordance with an embodiment of the present invention;

Fig. 5 is a flow chart that schematically illustrates a
method for tracking data storage, in accordance with an
embodiment of the present invention; and

Fig. 6 is a flow chart that schematically illustrates a
method for tracking data storage, in accordance with another
embodiment of the present invention.

DETAILED DESCRIPTION OF EMBODIMENTS

Fig. 1 is a block diagram that schematically
illustrates a data storage system 20, in accordance with an
embodiment of the present invention. System 20 comprises
storage subsystems 22 and 24, which are labeled "storage
node A" and "storage node B" for convenience. In the
description that follows, it is assumed that node A is
configured as the primary storage subsystem, while node B is
configured as the secondary storage subsystem for purposes
of data mirroring. Thus, to write and read data to and from
system 20, a host computer 26 (referred to hereinafter
simply as a "host") communicates over a communication link
28 with subsystem 22. Typically, link 28 is part of a
computer network, such as a storage area network (SAN).
Alternatively, host 26 may communicate with subsystem 22
over substantially any suitable type of communication link.
Although for the sake of simplicity, only a single host is
shown in Fig. 1, system 20 typically serves multiple hosts.
Typically, in normal operation, hosts may write data only to
primary storage subsystem 22, but may read data from either
subsystem 22 or 24.



Subsystems 22 and 24 may comprise substantially any
suitable type of storage device known in the art, such as a
storage server, SAN disk device or network-attached storage
(NAS) device. Subsystems 22 and 24 may even comprise
computer workstations, which are configured and programmed
to carry out the storage functions described herein.
Subsystems 22 and 24 may be collocated in a single facility
or, for enhanced data security, they may be located at
mutually-remote sites. Although system 20 is shown in
Fig. 1 as comprising only a single primary storage subsystem
and a single secondary storage subsystem, the principles of
the present invention may be applied in a straightforward
manner to systems having greater numbers of primary and/or
secondary storage subsystems. For example, the methods
described hereinbelow may be extended to a system in which
data written to a primary storage subsystem are mirrored on
two different secondary storage subsystems, in order to
protect against simultaneous failures at two different
points.

Each of subsystems 22 and 24 comprises a control unit
(CU) 30, typically comprising one or more microprocessors,
with a cache 32 and non-volatile storage media 34.
Typically, cache 32 comprises volatile random-access memory
(RAM), while storage media 34 comprise a magnetic disk or
disk array. Alternatively, other types of volatile and
non-volatile media may be used to carry out the cache and
storage functions of subsystems 22 and 24. Control units 30
typically carry out the operations described herein under
the control of software, which may be downloaded to
subsystems 22 and 24 in electronic form, over a network, for
example, or may be provided, alternatively or additionally,
on tangible media, such as CD-ROM. Subsystems 22 and 24
communicate between themselves over a high-speed
communication link 36, which may be part of a SAN or other
network, or may alternatively be a dedicated line between
the two subsystems. Subsystem 24 may also be coupled to
communicate with host 26, as well as with other hosts (not
shown), over a communication link 38, similar to link 28.
Link 38 enables subsystem 24 to serve as the primary storage
subsystem in the event of a failure in subsystem 22.

Fig. 2 is a block diagram that schematically
illustrates a storage subsystem 40, in accordance with an
alternative embodiment of the present invention. Subsystem
40 may be used, for example, in place of storage subsystem
24 in system 20 (Fig. 1). Subsystem 40 is a sort of storage
node made up of a local controller 42 and a remote disk 46.
Controller 42 comprises CU 30 and cache 32, similar to the
CU and memory used in subsystems 22 and 24. Disk 46,
however, is not connected directly to CU 30, but instead
communicates with the CU via a network 44. (For this
purpose, disk 46 typically comprises a communication
controller, not shown in the figures.) In this
configuration, CU 30 may write and read data to and from
disk 46 using a suitable network protocol, as is known in
the art.

The configuration of subsystem 40 is advantageous in
that it allows the control units of the primary and
secondary storage subsystems to be located at the same site,
while disk 46 is located at a remote site. This arrangement
facilitates rapid communication between the control units
(thus reducing the latency of the write protocol described
hereinbelow), while keeping backup data in disk 46 at a safe
distance in case of a disaster at the primary site.

Alternatively, controller 42 of subsystem 40 may be
held in particularly secure conditions at a first remote
site not far from subsystem 22, while disk 46 of subsystem
40 is farther away, at a second remote site. This
arrangement is advantageous in that it maintains data
security, without introducing long write latency due to
large distance between the locations of the storage media in
the primary and secondary subsystems. The first remote site
may be maintained by an outside service provider, who
provides secure storage on a fee-per-service basis to the
owner of subsystem 22 and to other storage users, thus
relieving the storage users of the need to maintain more
than two storage locations. The second remote site may be
maintained by the outside service provider, as well.

Fig. 3 schematically shows bitmaps 50 and 52, which are
used by CU 30 in each of subsystems 22 and 24 for recording
changes in the data stored by the other subsystem, in
accordance with an embodiment of the present invention. The
use of these bitmaps is described hereinbelow in detail with
reference to Figs. 4 and 5. Briefly, each bitmap 50, 52
comprises multiple bits 54, each corresponding to a storage
element on disk 34. For example, each bit may correspond to
a different track on the disk, or to some larger or smaller
range of physical addresses on the disk. Certain bits 56
are marked by the CU (i.e., the bits are set in the bitmap)
in each of subsystems 22 and 24 to indicate that data have
been written to the cache in the other subsystem prior to
transfer of the data to the corresponding storage elements
on the disk. Alternatively, other types of data structures,
as are known in the art, may be used for maintaining records
of the status of data in caches 32.
Each CU 30 subsequently clears the marked bits 56 in
its bitmap 50 or 52 when the CU is notified, as described
hereinbelow, that the data have been transferred from cache
32 to disk 34 on the other storage subsystem. This process
of transferring data from cache to disk may also be referred
to as "hardening" or "destaging" the data. (In the context
of the present patent application and in the claims,
"hardening" refers to any transfer of data from cache to
disk, while "destaging" is used in reference to a "destage
scan," whereby the CU performs an orderly transfer to disk
of all the data in the cache or in a range of the cache. In
other words, the destage scan hardens all the data in a
range of the cache or in the entire cache.)

Although the process of transferring data to disk may
be applied to the entire cache at once - whereupon the CU
clears the entire bitmap when the process is completed - it
may be more efficient to apply the process to smaller ranges
of addresses (i.e., smaller groups of tracks or other
storage elements) on the disk. For this purpose, each of
bitmaps 50 and 52 is divided into ranges 58, 60, 62 and 64.
Each range is effectively treated as a separate cache for
purposes of tracking data transfer to disk. For each range,
one of bitmaps 50 and 52 is treated as the current bitmap,
in which CU 30 marks the appropriate bits when data are
written to cache on the other subsystem, while the other
bitmap is treated as the old bitmap, as described below.
Although four ranges are shown in Fig. 3, cache 32 may
alternatively be divided into a larger or smaller number of
ranges for these purposes.
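
By way of illustration only, a bitmap that is divided into
ranges, as described above, might be sketched in Python as
follows. The class name RangedBitmap, its method names, and
the assumption that all ranges hold the same number of
tracks are hypothetical and are chosen only for this sketch;
the embodiments described herein do not prescribe any
particular data structure.

```python
class RangedBitmap:
    """Illustrative bitmap with one bit per track, split into equal ranges.

    A set bit means that data for the track were written to the cache of
    the other subsystem and may not yet have been hardened there.
    """

    def __init__(self, num_tracks: int, num_ranges: int) -> None:
        if num_tracks <= 0 or num_ranges <= 0 or num_tracks % num_ranges != 0:
            raise ValueError("num_tracks must be a positive multiple of num_ranges")
        self.num_tracks = num_tracks
        self.num_ranges = num_ranges
        self.range_size = num_tracks // num_ranges  # assumption: equal-sized ranges
        self.bits = 0  # a Python int serves as the bit field

    def _check(self, track: int) -> None:
        if not 0 <= track < self.num_tracks:
            raise IndexError(f"track {track} out of range")

    def range_of(self, track: int) -> int:
        """Return the index of the range that contains the given track."""
        self._check(track)
        return track // self.range_size

    def mark(self, track: int) -> None:
        """Set the bit for a track written to the other subsystem's cache."""
        self._check(track)
        self.bits |= 1 << track

    def clear_range(self, range_index: int) -> None:
        """Clear every bit in one range, e.g. once that range is hardened."""
        if not 0 <= range_index < self.num_ranges:
            raise IndexError(f"range {range_index} out of range")
        mask = ((1 << self.range_size) - 1) << (range_index * self.range_size)
        self.bits &= ~mask

    def marked_tracks(self) -> list:
        """List the tracks whose bits are currently set, in ascending order."""
        return [t for t in range(self.num_tracks) if (self.bits >> t) & 1]


if __name__ == "__main__":
    bitmap = RangedBitmap(num_tracks=16, num_ranges=4)
    for track in (1, 5, 6, 13):
        bitmap.mark(track)
    bitmap.clear_range(1)          # tracks 4 to 7 reported as hardened
    print(bitmap.marked_tracks())  # [1, 13]
```

Clearing a whole range with a single mask reflects the idea
that each range is tracked as if it were a separate cache.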
Fig. 4 is a flow chart that schematically illustrates a
method used in writing data from host 26 to storage system
20, in accordance with an embodiment of the present
invention. The method is invoked when host 26 writes data
over link 28 to the primary storage subsystem, i.e.,
subsystem 22 in the present example, at a host writing step
60. Upon receiving the data, CU 30 of subsystem 22 places
the data in cache 32, at a data caching step 62. CU 30
determines the track or tracks in which the data are to be
stored on disks 34 in subsystems 22 and 24, and marks the
corresponding bits 54 in the current bitmap 50 or 52. (As
noted above, the bitmaps are just one example of a data
structure that can be used to keep a record of the cache
status, and each bit may alternatively correspond to a data
element that is larger or smaller than a single track on the
disk.) CU 30 of subsystem 22 then writes the data to
subsystem 24 via link 36, at a data copying step 64.

CU 30 of secondary storage subsystem 24 receives the
data over link 36, at a secondary receiving step 66. The CU
of subsystem 24 places the data in its cache 32, and marks
the bits in its bitmap 50 or 52 that correspond to the
tracks for which the data are destined. Marked bits 56 in
the bitmap held by secondary storage subsystem 24 indicate
that primary storage subsystem 22 may have data in its cache
that have not yet been written to the corresponding tracks
on disk 34 of subsystem 22. After writing the data to cache
32, CU 30 of subsystem 24 sends an acknowledgment over link
36 to subsystem 22. Upon receiving the acknowledgment, CU
30 of subsystem 22 signals host 26, at an acknowledgment
step 68, to acknowledge to the host operating system that
the write operation was successfully completed. The
acknowledgment is issued to host 26 independently of
operations carried out on subsystems 22 and 24 to store the
cached data to disks 34. Thus, the acknowledgment may
typically be issued while the data are still in the volatile
cache and before the data have actually been stored on disks
34 or any other non-volatile media.
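
The write sequence of Fig. 4 may be sketched, purely for
illustration, as follows. The class Subsystem and its
attribute and method names are hypothetical stand-ins for
subsystems 22 and 24; a Python set stands in for the marked
bits 56, and a direct method call stands in for link 36.
The sketch omits hardening and error handling.

```python
from typing import Dict, Optional, Set


class Subsystem:
    """Hypothetical model of one storage subsystem (22 or 24)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[int, bytes] = {}       # volatile cache 32: track -> data
        self.disk: Dict[int, bytes] = {}        # non-volatile media 34 (not written here)
        self.peer_unhardened: Set[int] = set()  # stands in for marked bits 56
        self.peer: Optional["Subsystem"] = None

    def receive_copy(self, track: int, data: bytes) -> bool:
        """Secondary side (step 66): cache the copy, mark the track, acknowledge."""
        self.cache[track] = data
        self.peer_unhardened.add(track)
        return True  # the acknowledgment is returned before any write to disk

    def host_write(self, track: int, data: bytes) -> bool:
        """Primary side: steps 62 and 64, then the acknowledgment to the host."""
        if self.peer is None:
            raise RuntimeError("no secondary subsystem configured")
        self.cache[track] = data         # step 62: place the data in cache
        self.peer_unhardened.add(track)  # mark the bit for this track
        acked = self.peer.receive_copy(track, data)  # step 64: copy to the peer
        return acked                     # host is signaled; neither disk was written


if __name__ == "__main__":
    node_a, node_b = Subsystem("A"), Subsystem("B")
    node_a.peer, node_b.peer = node_b, node_a
    print(node_a.host_write(7, b"payload"))  # True
    print(node_a.disk, node_b.disk)          # {} {}
```

In this sketch the host receives its acknowledgment while
both copies of the data reside only in the two caches, which
is the property that the method of Fig. 4 relies upon.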

Once data have been written to cache 32, each CU 30
proceeds to transfer the data to disk 34. After a given
track or range of tracks has been hardened in this manner on
one of the storage subsystems, the CU notifies the other
storage subsystem, which then clears the corresponding bits
in its old bitmap. The notification preferably refers to a
range of tracks, rather than just a single track, since
sending notifications too frequently creates substantial
overhead traffic on link 36 between subsystems 22 and 24.
Some methods that can be used to perform data hardening and
to convey these "hardening notifications" efficiently are
described hereinbelow. When the CU of one subsystem is
notified that a given set of tracks has been hardened on the
other subsystem, it clears the corresponding marked bits 56
on the old bitmap. In the meanwhile, as the CU receives new
write data (at step 62 or 66 above), it marks the
corresponding bits in the current bitmap. A logical "OR" of
the current and old bitmaps held by the CU in each of
subsystems 22 and 24 then gives a map of all the tracks
containing data that may not yet have been hardened on the
other subsystem.
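
As a minimal illustration of the logical OR described
above, the following Python sketch treats each bitmap as an
integer in which bit t corresponds to track t. The function
name and the integer representation are assumptions made
for this sketch only.

```python
from typing import List


def possibly_unhardened(current: int, old: int, num_tracks: int) -> List[int]:
    """Return the tracks marked in either bitmap (bit t corresponds to track t)."""
    union = current | old  # logical OR of the current and old bitmaps
    return [t for t in range(num_tracks) if (union >> t) & 1]


if __name__ == "__main__":
    current_bitmap = 0b100100  # tracks 2 and 5 written since the last toggle
    old_bitmap = 0b000011      # tracks 0 and 1 not yet reported as hardened
    print(possibly_unhardened(current_bitmap, old_bitmap, 8))  # [0, 1, 2, 5]
```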
Fig. 5 is a flow chart that schematically illustrates
one method for tracking data hardening in system 20, in
accordance with an embodiment of the present invention.
This method is based on performing a destage scan, whereby
CU 30 of secondary subsystem 24 periodically scans its cache
32 (or scans a part of the cache corresponding to one of
ranges 58, 60, 62 or 64) and writes all unhardened data to
disk 34. Secondary storage subsystem 24 notifies primary
storage subsystem 22 as the secondary subsystem destages
each range. The identical method may be used to notify the
secondary subsystem of a destage scan on the primary
subsystem. Typically, the destage scan takes place at
predetermined intervals or, alternatively or additionally,
when CU 30 determines that the amount of unhardened data in
a certain range of the cache (which may include the entire
cache) is greater than some predetermined threshold. Note
that in between these destaging operations, CU 30 may
continue hardening data intermittently according to other
criteria, as is known in the art.

Before beginning the destage scan, CU 30 of subsystem
24 sends a message over link 36 to subsystem 22 to indicate
that the scan has started, at a starting step 70. The
message indicates the range of the cache that is to be
destaged. By way of example, let us assume that the destage
scan is to be applied to range 58. The range may
alternatively include the entire cache. Upon receiving the
message, CU 30 of subsystem 22 saves its current bitmap of
range 58 (in which it has marked the tracks for which data
have been written to subsystem 24 up to this point) as the
old bitmap of range 58, at an old bitmap saving step 72.
Referring to Fig. 3, let us assume that bitmap 50 has been
in use up to this point as the current bitmap for range 58,
and includes marked bits 56. Range 58 of bitmap 50 is now
saved as the old bitmap. Any previously-saved old bitmap of
range 58 is discarded. From this point forth, CU 30 of
subsystem 22 uses bitmap 52 as the current bitmap for range
58, so that any tracks to which new data are written to
cache in range 58 will now be recorded in bitmap 52. CU 30
of subsystem 22 then returns an acknowledgment to subsystem
24, at an acknowledgment step 74.

Upon receiving the acknowledgment, CU 30 of subsystem
24 begins its destage scan of range 58, at a destaging step
76. When destaging of the entire range is finished, CU 30
of subsystem 24 sends another message to subsystem 22,
indicating that the scan has been completed, at a completion
message step 78. Upon receiving the message, CU 30 of
subsystem 22 clears all the bits 54 in range 58 of bitmap 50
(the old bitmap), at a bitmap clearing step 80.

Range 64 in Fig. 3 shows an example of an old bitmap
range that has been cleared in bitmap 50, following which
new bits 56 are marked in bitmap 52. As another example, in
range 62, a destage scan has started with respect to old
bitmap 50, but has not yet been completed, so that some bits
in range 62 of bitmap 50 are still marked. Meanwhile, as
new data are written during the destage scan, CU 30 of
subsystem 22 has begun to mark bits in range 62 of the new
current bitmap 52. Although in these examples, for the sake
of clarity and convenience, bitmap 50 is referred to as the
old bitmap, while bitmap 52 is referred to as the current
bitmap, in actual operation the roles of "old" and "current"
bitmap toggle back and forth between the two bitmaps.
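
The message exchange of Fig. 5, as seen by the subsystem
that keeps the record, may be sketched for a single range as
follows. The class RangeRecord and its method names are
hypothetical, and Python sets stand in for the two bitmaps
of the range; exchanging the roles of the sets plays the
part of toggling between bitmaps 50 and 52.

```python
from typing import Set


class RangeRecord:
    """Hypothetical record kept by one subsystem for one range of its peer."""

    def __init__(self) -> None:
        self.current: Set[int] = set()  # tracks marked in the current bitmap
        self.old: Set[int] = set()      # tracks marked in the old bitmap

    def record_write(self, track: int) -> None:
        """Mark a track when data are written to cache on the other subsystem."""
        self.current.add(track)

    def on_scan_started(self) -> str:
        """Steps 72 and 74: save the current bitmap as old, then acknowledge."""
        self.old = self.current  # any previously saved old bitmap is discarded
        self.current = set()     # new writes are recorded in the other bitmap
        return "ack"

    def on_scan_completed(self) -> None:
        """Step 80: the peer has destaged the whole range, so clear the old bitmap."""
        self.old = set()

    def possibly_unhardened(self) -> Set[int]:
        """Union of both bitmaps: tracks that may not yet be hardened on the peer."""
        return self.current | self.old


if __name__ == "__main__":
    record = RangeRecord()
    record.record_write(3)
    record.record_write(4)
    record.on_scan_started()
    record.record_write(9)               # written while the scan is in progress
    print(sorted(record.possibly_unhardened()))  # [3, 4, 9]
    record.on_scan_completed()
    print(sorted(record.possibly_unhardened()))  # [9]
```

Track 9 survives the clearing step because it was recorded
after the toggle, and so cannot be assumed to have been
covered by the scan that has just completed.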

To illustrate failure recovery in system 20, let us
assume that subsystem 22 has failed, while subsystem 24
remains operational. At the time of failure, CU 30 of
subsystem 24 held bitmaps 50 and 52. The union (logical OR)
of all the bits that are marked in the two bitmaps indicates
all the tracks of data in cache 32 of subsystem 22 that may
have contained unhardened data at the time of failure. In
fact, some of these tracks may already have been hardened,
although notification did not reach subsystem 24. It can be
said with certainty, however, that there are no tracks that
have not been hardened on subsystem 22 whose corresponding
bits are not marked in the union of bitmaps 50 and 52 held
by CU 30 on subsystem 24. In other words, the union of
these bitmaps represents a superset of all the unhardened
tracks on subsystem 22.

At the time of failure, system 20 may "failover" to
subsystem 24, so that subsystem 24 now serves as the primary
storage subsystem. Subsystem 24 maintains a further record,
typically by marking additional bits in the united bitmap,
indicating the tracks to which data are written while
subsystem 22 is out of service.

When subsystem 22 is ready to return to service, CU 30
in subsystem 22 performs initial machine loading, as is
known in the art, and then asks subsystem 24 for a data
update. CU 30 of subsystem 24 then transfers to subsystem
22 the data in all the tracks that are marked in the united
bitmap. Once the transfer is complete, subsystem 22 may
resume operation as the primary storage subsystem.
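
The failover and recovery bookkeeping described above may
be illustrated by the following sketch. The function name,
its parameters, and the use of a dictionary keyed by track
number to represent the data held by each subsystem are
assumptions made for illustration only.

```python
from typing import Dict, Set


def recover_failed_subsystem(
    survivor_data: Dict[int, bytes],
    failed_data: Dict[int, bytes],
    current: Set[int],
    old: Set[int],
    written_during_outage: Set[int],
) -> Set[int]:
    """Copy back every track marked in the united bitmap.

    `current` and `old` are the tracks marked in the two bitmaps held by
    the surviving subsystem at the time of failure; `written_during_outage`
    holds the tracks marked while the failed subsystem was out of service.
    Returns the set of tracks that were selected for transfer.
    """
    united = current | old | written_during_outage  # superset of stale tracks
    for track in sorted(united):
        if track in survivor_data:
            failed_data[track] = survivor_data[track]
    return united


if __name__ == "__main__":
    survivor = {1: b"a", 2: b"b", 3: b"c", 4: b"d"}
    failed = {1: b"a", 2: b"stale"}
    sent = recover_failed_subsystem(survivor, failed, {2}, {3}, {4})
    print(sorted(sent))        # [2, 3, 4]
    print(failed == survivor)  # True
```

Track 1 is never transferred in this example, which is the
intended saving: only tracks that appear in the united
bitmap need to cross the link during recovery.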

Alternatively, other methods may be used for clearing bits
in bitmaps 50 and 52, besides the periodic destaging method
shown in Fig. 5. The method described above is
advantageously simple, in that it allows each subsystem 22
and 24 to maintain bitmaps 50 and 52 (or other data
structures) only with respect to the unhardened tracks on
the other system. In some alternative embodiments of the
present invention, control units 30 on both of subsystems 22
and 24 maintain similar bitmaps 50 and 52 with respect to
the data tracks that have been copied from subsystem 22 to
subsystem 24. In other words, each subsystem maintains two
sets of bitmaps: a first set indicating the tracks that may
not yet have been hardened on the other subsystem, and a
second set mirroring the first set of bitmaps maintained by
the other subsystem. Because the bitmaps in the first set
maintained by each subsystem are generated as a result of
messages received from the other subsystem (as described
above with reference to Fig. 4), it is a straightforward
matter for each subsystem to build and maintain its second
set of bitmaps based on the messages that it transmits to
the other subsystem.

Fig. 6 is a flow chart that schematically illustrates one
such method, in accordance with an embodiment of the present
invention. We again consider tracking of data hardening on
subsystem 24 (although the method may likewise be applied to
hardening of data on subsystem 22). Again taking region 58
as an example, as subsystem 22 conveys data to subsystem 24,
both subsystems mark bits 56 in region 58 of bitmap 50 (in
the above-mentioned "first set" of bitmaps maintained by
subsystem 22 and in the "second set" of mirror bitmaps
maintained by subsystem 24) to indicate the tracks that are
to be hardened. Subsystem 24 increments a counter N for each
new bit that is marked in range 58 (and similarly in ranges
60, 62 and 64). When subsystem 24 hardens a track in region
58, it clears the bit and decrements N, without notifying
subsystem 22. Subsystem 24 may choose the tracks to harden
using any suitable criterion, such as hardening
least-recently-used tracks. There is no need for subsystem
24 to perform an orderly destaging of an entire region, as
in the method of Fig. 5.
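The bookkeeping described in this paragraph may be sketched
in Python as follows. This is a minimal illustration under
stated assumptions: the class name, the attribute names and
the use of sets in place of bitmaps are hypothetical, and
both copies of the region are modeled in one object only for
brevity.

```python
class MirroredRegion:
    """One region as seen by both subsystems before any bitmap toggle."""

    def __init__(self):
        self.first_set = set()   # bitmap kept by the sending subsystem
        self.mirror_set = set()  # mirror bitmap kept by the receiving subsystem
        self.n = 0               # counter N: bits set in the mirror bitmap

    def convey(self, track):
        """Data for a track is sent: both sides mark the corresponding bit."""
        self.first_set.add(track)
        if track not in self.mirror_set:
            self.mirror_set.add(track)
            self.n += 1          # N counts only newly marked bits

    def harden(self, track):
        """The receiver hardens a track, in any order it chooses."""
        if track in self.mirror_set:
            self.mirror_set.discard(track)
            self.n -= 1
        # No notification is sent, so first_set is left unchanged.
```

Because no notification is sent, the sender's bitmap remains
a superset of the receiver's mirror bitmap in this sketch.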

Periodically, subsystem 24 sends a message to subsystem 22
to indicate that it is about to switch to a new bitmap for a
given region, say region 58, at a toggle notification step
90. Subsystem 24 then waits for subsystem 22 to acknowledge
the message, at an acknowledgment step 92. Region 58 of
bitmap 50, the old bitmap of the region, is then locked in
both subsystems 22 and 24, and all subsequent data writes to
the region are marked in bitmap 52, the new bitmap. A
counter M for region 58 of the old bitmap is initialized to
the value N, i.e., to the number of bits that are set in
this region of the old bitmap, at a counter setting step 94.
A new counter N for region 58 in bitmap 52 is set initially
to zero and is then incremented and decremented as described
above.

Subsystem 24 meanwhile continues to harden the tracks that
are marked in region 58 of bitmap 50 as containing cached
data that are yet to be hardened, as well as to receive
further data from subsystem 22 to write to the tracks in
this region. When subsystem 24 receives data to be written
to a given track, it checks to determine whether this track
is already marked in the new bitmap of region 58, at a new
track write checking step 96. If so (indicating that there
has already been a write to this track since the last bitmap
toggle), no further marking of either the old or new bitmap
is required. If this track is not marked in the new bitmap,
however, subsystem 24 marks the track in the new bitmap, at
a track marking step 98. Subsystem 24 then checks whether
this track is marked in the old bitmap, at an old track
checking step 100. If so, subsystem 24 decrements M, at a
counter decrementation step 102. Otherwise, the counter is
not decremented.
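Steps 90 through 102 may be sketched in Python as follows,
by way of example only. The class name, the peer_acks
callback and the set-based bitmaps are assumptions of this
sketch. It also assumes that a toggle is requested only
after the previous old bitmap has been cleared, and that a
missing acknowledgment simply leaves the state unchanged.

```python
class SecondaryRegion:
    """State kept for one region by the subsystem that hardens the data."""

    def __init__(self):
        self.old = frozenset()  # locked bitmap from before the last toggle
        self.new = set()        # bitmap that currently receives marks
        self.m = 0              # counter M for the old bitmap
        self.n = 0              # counter N for the new bitmap

    def toggle(self, peer_acks):
        """Steps 90 to 94: notify the peer, wait for its reply, swap bitmaps."""
        if not peer_acks():              # steps 90 and 92 (hypothetical callback)
            return False
        self.old = frozenset(self.new)   # the old bitmap is now locked
        self.m = self.n                  # step 94: M starts at the value of N
        self.new = set()
        self.n = 0
        return True

    def on_write(self, track):
        """Steps 96 to 102: handle data received for a track."""
        if track in self.new:            # step 96: already marked since the toggle
            return
        self.new.add(track)              # step 98
        self.n += 1
        if track in self.old:            # step 100
            self.m -= 1                  # step 102
```

A track written both before and after the toggle is thus
accounted for in the new bitmap, and no longer holds up the
clearing of the old one.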

Meanwhile, whenever subsystem 24 hardens a data track, it
checks to determine whether this track is marked in the new
bitmap of region 58, at a new track hardness checking step
104. If so, subsystem 24 clears the corresponding bit in the
new bitmap, at a track clearing step 106, and decrements N.
In this case, there is no need to decrement the count M,
since it was already decremented with respect to this track
when the track was marked in the new bitmap. On the other
hand, if the track that has been hardened is not marked in
the new bitmap, subsystem 24 checks to determine whether the
track is marked in the old bitmap of region 58, at step 100.
If so, subsystem 24 decrements M at step 102. Subsystem 24
checks to determine when M reaches zero, at a termination
testing step 110. When M drops to zero, it means that all
the tracks that were previously marked in the old bitmap
have now either been hardened or marked in the new bitmap,
so that the old bitmap is no longer needed by subsystem 22.
Therefore, subsystem 24 notifies subsystem 22 that it should
clear all the bits in region 58 of bitmap 50, as at step 80
in the method of Fig. 5. The process of Fig. 6 toggles back
and forth between bitmaps 50 and 52, as described above.
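The hardening path of Fig. 6 (steps 104, 106, 100, 102 and
110) may be sketched in Python as follows. The function
names and the dictionary layout are hypothetical. The
"counted" set, which remembers the old-bitmap tracks already
counted off M, is a safeguard added here as an assumption so
that no track is counted twice; it is not described in the
text.

```python
def make_region(old, new):
    """Build the post-toggle state from the locked old and current new bitmaps."""
    old, new = set(old), set(new)
    counted = old & new  # tracks a write has already counted off M
    return {"old": old, "new": new, "counted": counted,
            "m": len(old) - len(counted), "n": len(new)}


def on_harden(region, track):
    """Process one hardened track.

    Returns True when this call brings M to zero, i.e. when the peer
    should be told to clear its old bitmap for the region.
    """
    if track in region["new"]:            # step 104
        region["new"].discard(track)      # step 106
        region["n"] -= 1                  # N is decremented; M is left alone
        return False
    if track in region["old"] and track not in region["counted"]:  # step 100
        region["counted"].add(track)      # assumption: guard against recounting
        region["m"] -= 1                  # step 102
        return region["m"] == 0           # step 110
    return False
```

In this sketch the clear notification is raised exactly
once per toggle cycle, on the call that takes M from one to
zero.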

As another alternative, again based on maintaining a second
set of mirror bitmaps as described above, in a variation on
the method of Fig. 5, subsystem 24 selectively hardens
certain tracks in a given region at step 78, rather than
destaging the entire region. Subsystem 24 may choose the
tracks to harden by comparing its old bitmap to a bitmap of
unhardened tracks in cache 32. Subsystem 24 hardens all
tracks that are marked in both the old bitmap and the local
cache. Upon finishing this step, subsystem 24 notifies
subsystem 22 that it may now clear its old bitmap of the
region in question (since all the data written to the tracks
in this region have been hardened on at least one of the
subsystems).
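The selection of tracks in this alternative amounts to an
intersection of two bitmaps, which may be sketched in Python
as follows. The function name, the set-based bitmaps and the
harden callback are assumptions made for illustration only.

```python
def selectively_harden(old_bitmap, unhardened_in_cache, harden):
    """Harden only tracks marked in both the old bitmap and the local cache.

    Returns the list of tracks that were hardened. Once this function
    returns, the caller may tell the peer to clear its old bitmap.
    """
    to_harden = sorted(set(old_bitmap) & set(unhardened_in_cache))
    for track in to_harden:
        harden(track)  # hypothetical callback that writes the track to disk
    return to_harden
```

Tracks marked in the old bitmap but absent from the local
bitmap of unhardened tracks are skipped, which is what
distinguishes this alternative from destaging the whole
region.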

As noted above, although certain configurations of system 20
and certain particular data mirroring protocols are
described above in order to illustrate the principles of the
present invention, these principles may similarly be applied
in other system configurations and using other protocols, as
will be apparent to those skilled in the art. It will thus
be appreciated that the embodiments described above are
cited by way of example, and that the present invention is
not limited to what has been particularly shown and
described hereinabove. Rather, the scope of the present
invention includes both combinations and subcombinations of
the various features described hereinabove, as well as
variations and modifications thereof which would occur to
persons skilled in the art upon reading the foregoing
description and which are not disclosed in the prior art.